Effects of Inconsistent Relevance Judgments on Information Retrieval Test Results: A Historical Perspective
Abstract
The main objective of information retrieval (IR) systems is to retrieve information or information objects relevant to user requests and possible needs. In IR tests, retrieval effectiveness is established by comparing IR systems’ retrievals (systems relevance) with users’ or user surrogates’ assessments (user relevance), where user relevance is treated as the gold standard for performance evaluation. Relevance is a human notion, and establishing relevance by humans is fraught with a number of problems, inconsistency in judgment being one of them. The aim of this critical review is to explore the relationship between relevance on the one hand and testing of IR systems and procedures on the other. Critics have questioned the validity of IR tests because the tests are based on inconsistent relevance judgments. This review traces and synthesizes experimental studies dealing with (1) inconsistency of relevance judgments by people, (2) effects of such inconsistency on results of IR tests, and (3) reasons for retrieval failures. A historical context for these studies and for IR testing is provided, including an assessment of Lancaster’s (1969) evaluation of MEDLARS and its unique place in the history of IR evaluation.
LIBRARY TRENDS, Vol. 56, No. 4, Spring 2008 (“The Evaluation and Transformation of Information Systems: Essays Honoring the Legacy of F. W. Lancaster,” edited by Lorraine J. Haricombe and Keith Russell), pp. 763–783. © 2008 The Board of Trustees, University of Illinois.
Similar resources
The Role of the FUM Students' Demographic Features in the Relevance Judgment Scores of Their Information Retrieval Results in Search Engines
Designing user-friendly information retrieval systems requires attention to the characteristics of users. The present study therefore investigates the role of users' demographic variables in their searches in search engines. Method: This is an applied study, conducted using the evaluation method. To conduct the research, firstly,...
Accuracy, Agreement, Speed, and Perceived Difficulty of Users’ Relevance Judgments for E-Discovery
This paper presents a study in which four law students and four Library and Information Science (LIS) students independently judged the relevance of documents selected from the e-discovery test collections of the Text REtrieval Conference. The results were compared with the official relevance ground truth and among participants. Given the same task guidelines and minimal training, on average th...
Matching Scores of System Relevance and User-Oriented Relevance in SID, ISC and Google Scholar
Background and Aim: The main aim of information storage and retrieval systems is to store and retrieve relevant information, that is, to provide documents that match users’ needs or requests. This study examined how closely system relevance and user-oriented relevance match in the SID, ISC, and Google Scholar databases. Method: In this study, 15 keywords of ...
Using Multiple Query Aspects to Build Test Collections without Human Relevance Judgments
Collecting relevance judgments (qrels) is an especially challenging part of building an information retrieval test collection. This paper presents a novel method for creating test collections by offering a substitute for relevance judgments. Our method is based on an old idea in IR: a single information need can be represented by many query articulations. We call different articulations of a pa...